(57) ABSTRACT

A method for analyzing traffic data generated by a plurality 
of web servers, which host a single web site. The site is 
mirrored on each server. A traffic data hit is generated
responsive to each access of one of the servers. The hit 
includes data representing the time of the access. Each data 
hit is stored in a log file on the server accessed. The 
first-stored data hit is read from each server. The read
data hits are compared, and the oldest data hit is passed to
a log file analyzer. The next-stored data hit is read from the 
server from which the passed data hit was read, and a second 
comparison is performed on the read data hits, with the 
oldest data hit being passed to the log file analyzer. This 
process continues until all of the data hits are read, 
compared, and passed to the log file analyzer. This results in
passing all of the data hits to the log file analyzer in the
chronological order in which the hits were generated. 
SYSTEM AND METHOD FOR ANALYZING
WEB-SERVER LOG FILES 

BACKGROUND OF THE INVENTION 

This invention relates generally to web-server traffic data
analysis and more particularly to a system and method for 
analyzing web-server log files. 

The worldwide web (hereinafter "web") is rapidly becom- 
ing one of the most important publishing mediums today. 
The reason is simple: web servers interconnected via the 
Internet provide access to a potentially worldwide audience 
with a minimal investment in time and resources in building 
a web site. The web server makes available for retrieval and 
posting a wide range of media in a variety of formats, 
including audio, video and traditional text and graphics. And 
the ease of creating a web site makes reaching this world- 
wide audience a reality for all types of users, from 
corporations, to startup companies, to organizations and 
individuals. 

Unlike other forms of media, a web site is interactive and 
the web server can passively gather access information 
about each user by observing and logging the traffic data 
packets exchanged between the web server and the user. 
Important facts about the users can be determined directly or
inferentially by analyzing the traffic data and the context of 
the "hit." Moreover, traffic data collected over a period of
time can yield statistical information, such as the number of 
users visiting the site each day, what countries, states or 
cities the users connect from, and the most active day or hour 
of the week. Such statistical information is useful in tailoring 
marketing or managerial strategies to better match the 
apparent needs of the audience. Each hit is also encoded 
with the date and time of the access. Because the statistical 
information of interest is virtually all related to time periods, 
accurately tracking the time of each hit is critical. 

To optimize use of this statistical information, web server 
traffic analysis must be timely. However, it is not unusual for 
a web server to process thousands of users daily. The 
resulting access information recorded by the web server 
amounts to megabytes of traffic data. Some web servers 
generate gigabytes of daily traffic data. Analyzing the traffic 
data for even a single day to identify trends or generate 
statistics is computationally intensive and time-consuming. 
Moreover, the processing time needed to analyze the traffic 
data for several days, weeks or months increases linearly as 
the time frame of interest increases. 

The problem of performing efficient and timely traffic 
analysis is not unique to web servers. Rather, traffic data 
analysis is possible whenever traffic data is observable and 
can be recorded in a uniform manner, such as in a distributed 
database, client-server system or other remote access envi- 
ronment. 

Some web servers are so busy, i.e., handle so much traffic, 
that they require multiple servers to handle all of the traffic. 
Other users may need to employ multiple servers because of 
the large size of the web site. Critical sites, i.e., ones that 
cannot afford to be down because of a problem with a server, 
may also choose to deploy their site on multiple servers. 
Such multiple servers are sometimes referred to as a server 
farm. Server farms provide high bandwidth reliable access to 
web sites. 

There are several topologies that may be used in a server 
farm, but the most important ones divide the farm into 
clusters of servers. The web site is mirrored on each server 
within the cluster. Special hardware receives all of the traffic 
to the web site and distributes each hit to one of the servers. 
Some systems provide accurate load balancing in that all of
the hits are rotated in sequence among each of the servers. 
But others assign each hit from a new source to a server, and 
further access to the site from that source is directed to the 
assigned server. This is accomplished by assigning a predetermined
time period, for example 30 minutes, during which
all future access from the same source is considered to be 
part of a single session from that source. As described 
further below, the latter approach permits some log-file 
analysis, which is not possible using the load-balancing 
technique. 
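
The two distribution schemes described above can be contrasted with a short Python sketch. It is illustrative only and does not model any particular redirector: the function names, the (timestamp, source) layout of a hit and the server labels are assumptions, and only the 30-minute session period comes from the text.

```python
from itertools import cycle

SESSION_WINDOW = 30 * 60  # seconds; the 30-minute example period from the text


def round_robin(hits, servers):
    """Rotate hits among the servers in sequence, ignoring the source."""
    rotation = cycle(servers)
    return [(next(rotation), hit) for hit in hits]


def session_affinity(hits, servers, window=SESSION_WINDOW):
    """Keep a source on its assigned server while its session period lasts."""
    rotation = cycle(servers)
    sessions = {}  # source -> (assigned server, time the session started)
    routed = []
    for timestamp, source in hits:
        server, started = sessions.get(source, (None, None))
        if server is None or timestamp - started > window:
            # New source, or the session period has expired: assign afresh.
            server, started = next(rotation), timestamp
            sessions[source] = (server, started)
        routed.append((server, (timestamp, source)))
    return routed


# Hypothetical hits: (seconds since an arbitrary start, source address).
hits = [(0, "a"), (10, "b"), (20, "a"), (1900, "a")]
servers = ["s1", "s2", "s3"]

print([server for server, _ in round_robin(hits, servers)])
print([server for server, _ in session_affinity(hits, servers)])
```

Under rotation, the three hits from source "a" land on different servers, so no single log file holds that user's whole visit. Under session assignment, the first two hits from "a" stay together until the session period lapses.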

Server farms, although providing load balancing and 
redundancy, present problems in analyzing the log files 
generated by the servers. Prior art systems for analyzing 
web-server log files can handle multiple log files, but these 
files are consecutively generated, i.e., the data packets 
within each log file are in chronological order and the log 
files themselves correspond to time periods containing data 
packets from within the periods. In other words, the log files 
are also consecutively generated. Log files on servers in a 
server farm, however, are concurrently generated. Each log
file covers or overlaps the same time period. On server farms 
that rotate the hits among each server, log file analysis 
programs do not generate useful information. Brute force
solutions are possible, such as sorting all of the log files and
creating a new single file, or copying all of the hits from each
log file to a large database, which can sort and analyze the 
data. These solutions have severe drawbacks: they are 
computationally intensive, they require creation of large 
new files, and they are done only after log files are complete, 
i.e., not on the fly while the log file is still being populated. 

Server farms that assign hits from a new source to a single
server can run prior art log analysis programs on each server
and sum the results. This, however, is not completely
accurate and is disadvantageous because it requires generation
of separate reports that must each be consulted or
further manipulated to obtain information that applies to the
entire server farm.

There is consequently a need for a system and method for 
analyzing web-server log files that are concurrently 
generated, such as those generated by a server farm. 
There is a further need for such a system and method that
can analyze the log files substantially in real time.

There is still a further need for such a system that can
analyze the log files without generating new large files and
without the need for substantial additional computing power.

There is also a need for such a system that can analyze log
files whether they are concurrently or consecutively generated.

SUMMARY OF THE INVENTION 

The present invention comprises a method for analyzing
log files containing a plurality of data packets in sequence
comprising: (a) selecting the first data packet in each log file;
(b) comparing the selected data packets; (c) passing the
oldest of the selected data packets to a log file analyzer; (d)
selecting the next data packet in the log file in which the
passed data packet was selected; and (e) repeating steps (b)
through (d) until all of the data packets in the log files are
passed.
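
Steps (a) through (e) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the name `merge_log_files`, the (timestamp, payload) tuple used for a data packet and the sample log contents are all hypothetical.

```python
def merge_log_files(log_files, analyze):
    """Pass every packet to `analyze` in chronological order.

    `log_files` is a list of iterables, each already in time order;
    each packet is a (timestamp, payload) tuple.
    """
    readers = [iter(log) for log in log_files]
    # (a) select the first data packet in each log file
    selected = {}
    for index, reader in enumerate(readers):
        first = next(reader, None)
        if first is not None:
            selected[index] = first
    while selected:
        # (b) compare the selected packets to find the oldest
        oldest = min(selected, key=lambda index: selected[index][0])
        # (c) pass the oldest packet to the log file analyzer
        analyze(selected[oldest])
        # (d) select the next packet in the file the passed one came from
        following = next(readers[oldest], None)
        if following is None:
            del selected[oldest]  # that log file is exhausted
        else:
            selected[oldest] = following
        # (e) repeat until every packet has been passed


# Hypothetical concurrently generated logs; timestamps are plain integers.
concurrent = [
    [(1, "A"), (5, "A"), (6, "A"), (12, "A")],
    [(2, "B"), (3, "B"), (8, "B"), (10, "B")],
    [(4, "C"), (7, "C"), (9, "C"), (11, "C")],
]
# Hypothetical consecutively generated logs.
consecutive = [
    [(1, "A"), (2, "A"), (3, "A"), (4, "A")],
    [(5, "B"), (6, "B"), (7, "B"), (8, "B")],
]

for logs in (concurrent, consecutive):
    passed = []
    merge_log_files(logs, passed.append)
    print([timestamp for timestamp, _ in passed])
```

Only one packet per log file is held at any time, so the sketch creates no merged file, and the same routine handles both overlapping and non-overlapping logs.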

The foregoing and other features and advantages of the 
invention will become more readily apparent from the 
following detailed description of a preferred embodiment of 
the invention, which proceeds with reference to the accom- 
panying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a functional block diagram of a prior art system
for analyzing traffic data in a distributed computing envi- 
ronment according to the present invention. 
FIG. 2 is a flow diagram of a prior art method for
analyzing traffic data in a distributed computing environment
according to the present invention using the system of
FIG. 1.

FIG. 3A shows a prior art format used in storing a "hit"
of traffic data received by the server of FIG. 1.

FIG. 3B shows, by way of example, a hit of formatted
traffic data received by the server of FIG. 1.

FIG. 4 is a schematic diagram of a server farm, including
multiple servers like that shown and described in FIG. 1.

FIG. 5 is a schematic diagram illustrating the operation of
the server farm of FIG. 4.

FIG. 6 is a schematic diagram illustrating the present
invention implemented in the server farm of FIG. 4.

FIG. 7 is a schematic diagram similar to FIG. 6, but
illustrating the present invention operating on consecutive
log files.

FIG. 8 is a flow diagram of a routine for implementing the
present invention.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT

FIG. 1 is a functional block diagram of a prior art system
for analyzing traffic data in a distributed computing
environment 9. It is more fully described in "WebTrends
Installation and User Guide," version 2.2, October 1996, and in
U.S. patent application Ser. No. 08/801,707, now U.S. Pat.
No. 6,112,238, the disclosures of which are incorporated
herein by reference. WebTrends is a trademark of Webtrends
Corporation, Portland, Oreg.

A server 10 provides web site and related services to
remote users. By way of example, the remote users can
access the server 10 from a remote computer system 12
interconnected with the server 10 over a network connection
13, such as the Internet or an intranetwork, a dial up (or
point-to-point) connection 14 or a direct (dedicated)
connection 17. Other types of remote access connections are
also possible.

Each access by a remote user to the server 10 results in a
"hit" of raw traffic data 11. The format used in storing each
traffic data hit 11 and an example of a traffic data hit 11 are
described below with reference to FIGS. 3A and 3B,
respectively. The server 10 preferably stores each traffic data
hit 11 in a log file 15, although a database 16 or other storage
structure can be used.

To analyze the traffic data, the server 10 examines each
traffic data hit 11 and stores the access information obtained
from the traffic data as analysis results 18A-C. Five sources
of traffic data 11 (remote system 12, dial-up connection 14,
log file 15, database 16 and direct connection 17) are shown.
Other sources are also possible. The traffic data hits 11 can
originate from any single source or from a combination of
these sources. While the server 10 receives traffic data hits
11 continuously, separate sets of analysis results 18A-C are
stored for each discrete reporting period, called a time slice.
The analysis results 18A-C are used for generating
summaries 19A-C of the access information.

In the described embodiment, the server 10 is typically an
Intel Pentium-based computer system equipped with a
processor, memory, input/output interfaces, a network
interface, a secondary storage device and a user interface,
preferably such as a keyboard and display. The server 10
typically operates under the control of either the Microsoft
Windows NT or Unix operating systems and executes either
Microsoft Internet Information Server or NetScape
Communications Server software. Pentium, Microsoft, Windows,
Windows NT, Unix, Netscape and Netscape Communications
Server are trademarks of their respective owners.
However, other server 10 configurations varying in
hardware, such as DOS-compatible, Apple Macintosh, Sun
Workstation and other platforms, in operating systems, such
as MS-DOS, Unix and others, and in web software are also
possible. Apple, Macintosh, Sun and MS-DOS are
trademarks of their respective owners.

FIG. 2 is a flow diagram of a method 20 for analyzing
traffic data in a distributed computing environment
according to the present invention using the system of FIG. 1. Its
purpose is to continuously collect and summarize access
information from traffic data hits 11 while allowing
on-demand, ad hoc analyses. The method 20 consists of two
routines. Access information is collected from traffic data
hits 11 and summarized by the server 10 into analysis results
18A-C (block 21). The access information is separately
analyzed for generating the summaries 19A-C which
identify trends, statistics and other information (block 22). The
collection and summarizing of the access information (block
21) is performed continuously by the server 10 while the
analysis of the access information (block 22) is performed
[illegible] workstation (not shown).

The method 20 is preferably implemented as a computer
program executed by the server 10 and embodied in a
[illegible] comprising computer-readable code. In the
described embodiment, the method 20 is written in the C
programming language, although other programming
languages are equally suitable. It operates in a Microsoft
Windows environment and can analyze Common Log File,
Combined Log File and proprietary log file formats from
industry standard web servers, such as those licensed by
[illegible] C-Builder, Microsoft, Oracle, EMWAC, and other
Windows 3.x, Windows NT 95, Unix and Macintosh Web
servers. The analysis results 18A-C can be stored in a
proprietary or standard database 16 (shown in FIG. 1), such as
SQL, BTRIEVE, ORACLE, INFORMIX and others. The method
20 uses the analysis results 18A-C of traffic data hits 11 as
[illegible] into the log file 15 or database 16 for building
[illegible] geographic, demographic and other summaries
[illegible] summaries 19A-C [illegible] possible.

User Profile by Regions               General Statistics Table
[illegible]
Single Access Pages                   Top Paths Through Site
Advertising Views                     Advertising Clicks
Advertising Views and Clicks          Most Downloaded Files
[illegible] Organizations             Most Active Countries
Activity Summary by Day of Week       Activity Summary by Day
Activity Summary by Hour of the Day   Activity Summary Level by
                                      Hours of the Day
[illegible] Statistics and Analysis   Client Errors
[illegible]
Activity by Organization Type         Top Directories Accessed
Top Referring Sites                   Top Referring URLs
Top Browsers                          Netscape Browsers
[illegible]                           Visiting Spiders
[illegible]

In addition, the analysis results 18A-C can be used for
automatically producing reports and summaries which
include statistical information and graphs showing, by way
of example, user activity by market, interest level in specific
web pages or services, which products are most popular,
whether a visitor has a local, national or international origin
and similar information. In the described embodiment, the
summaries 19A-C can be generated as reports in a variety
of formats. These formats include hypertext markup
language (HTML) files compatible with the majority of popular
web browsers, proprietary file formats for use with word
processing, spreadsheet, database and other programs, such
as Microsoft Word, Microsoft Excel, ASCII files and various
other formats. Word and Excel are trademarks of Microsoft
Corporation, Redmond, Wash.

FIG. 3A shows a format used in storing a "hit" of raw
traffic data 11 received by the server of FIG. 1. A raw traffic
data hit 11 is not in the format shown in FIG. 3A. Rather, the
contents of each field in the format are determined from the
data packets exchanged between the server 10 and the
source of the traffic data hit 11 and the information pulled
from the data packets is stored into a data record using the
format of FIG. 3A prior to being stored in the log file 15
(shown in FIG. 1) or processed.

Each traffic data hit 11 is a formatted string of ASCII data.
The format is based on the standard log file format devel- 
oped by the National Computer Security Association 
(NCSA), the standard logging format used by most web 
servers. The format consists of seven fields as follows: 



Field Name                 Description

User Address (30):         Internet protocol (IP) address or domain name of the
                           user accessing the site.
RFC931 (31):               Obsolete field usually left blank, but increasingly used
                           by many web servers to store the host domain name
                           for multi-homed log files.
User Authentication (32):  Exchanges the user name if required for access to the
                           web site.
Date/Time (33):            Date and time of the access and the time offset from
                           GMT.
Request (34):              Either GET (a page request) or POST (a form
                           submission) command.
Return Code (35):          Return status of the request which specifies whether the
                           transfer was successful.
Transfer Size (36):        Number of bytes transferred for the file request, that is,
                           the file size.

In addition, three optional fields can be employed as
follows:

Field Name                 Description

Referring Site (37):       URL used to obtain web site information for
                           performing the "hit."
Agent (38):                Browser version, including make, model or version
                           number and operating system.
Cookie (39):               Unique identifier permissively used to identify a
                           particular user.

Other formats of traffic data hits 11 are also possible,
including proprietary formats containing additional fields,
such as time to transmit, type of service operation and
others. Moreover, modifications and additions to the formats
of raw traffic data hits 11 are constantly occurring and the
extensions required by the present invention to handle such
variations of the formats would be known to one skilled in
the art.

FIG. 3B shows, by way of example, a "hit" of raw traffic
data received by the server of FIG. 1. The user address 30
field is "tarpon.gulf.net" indicating the user originates from
a domain named "gulf.net" residing on a machine called
"tarpon." The RFC931 31 and user authorization 32 fields
are "-" indicating blank entries. The Date/Time 33 field is
"Jan. 12, 1996:20:38:17+0000" indicating an access on Jan.
12, 1996 at 8:38:17 pm GMT. The Request 34 field is
"GET /general.htm HTTP/1.0" indicating the user requested
the "general.htm" page. The Return Code 35 and Transfer
Size 36 fields are 200 and 3599, respectively, indicating a
successful transfer of 3599 bytes.
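
As an illustration of how the seven fields might be pulled out of one stored hit, the following Python sketch parses a single log line. The raw line is an assumption: it places the example values into the conventional common-log layout (bracketed timestamp, quoted request), which may differ from the figure. The names `HIT_PATTERN` and `parse_hit` are hypothetical.

```python
import re
from datetime import datetime

# Seven fields, commented with the reference numerals used in the field list.
HIT_PATTERN = re.compile(
    r'(?P<user_address>\S+) '         # 30
    r'(?P<rfc931>\S+) '               # 31
    r'(?P<user_authentication>\S+) '  # 32
    r'\[(?P<date_time>[^\]]+)\] '     # 33
    r'"(?P<request>[^"]*)" '          # 34
    r'(?P<return_code>\d{3}) '        # 35
    r'(?P<transfer_size>\d+|-)'       # 36
)


def parse_hit(line):
    """Split one log line into its seven fields; return None if it does not match."""
    match = HIT_PATTERN.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    hit["date_time"] = datetime.strptime(hit["date_time"], "%d/%b/%Y:%H:%M:%S %z")
    hit["return_code"] = int(hit["return_code"])
    size = hit["transfer_size"]
    hit["transfer_size"] = 0 if size == "-" else int(size)
    return hit


# Assumed raw form of the example hit (not quoted from the figure).
line = ('tarpon.gulf.net - - [12/Jan/1996:20:38:17 +0000] '
        '"GET /general.htm HTTP/1.0" 200 3599')
hit = parse_hit(line)
print(hit["user_address"], hit["date_time"].isoformat(), hit["transfer_size"])
```

Converting the Date/Time field to a timezone-aware value matters for the analysis, because hits recorded with different GMT offsets can then be compared directly.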

Turning now to FIG. 4, indicated generally at 40 is a 
server farm constructed in accordance with the present 
invention. Included therein are two server clusters 42, 44, 
each of which includes servers 46, 48, 50 and servers 52, 54,
56, respectively. Each of the servers in clusters 42, 44 is
substantially identical to server 10 in FIG. 1. In the present 
embodiment, server cluster 42 hosts a first web site, which 
is mirrored on each of the servers therein, at a single 
identified Internet Protocol (IP) address. The servers in 
cluster 44 host a second web site, which is mirrored on each 
of the servers therein, at a second identified IP address. 

Each of the servers in clusters 42, 44 is connected via a 
cable, like cable 58, to a redirector 60. The redirector in turn 
receives an input from a network connection 62, which in 
the present embodiment is an Internet connection. Redirec- 
tor 60 is a prior art hardware device that receives a source 
of traffic data hits — in the present case, via connection 
62 — and distributes them to the servers in clusters 42, 44. 

In the present implementation, redirector 60 distributes 
traffic data hits within each of clusters 42, 44. In other words,
the traffic data hits generated as a result of access to the web 
site posted on cluster 42, are distributed among servers 46, 
48, 50. Similarly, traffic data hits produced by accessing the 
web site on cluster 44 are distributed among servers 52, 54, 
56. One device suitable for functioning as a redirector is 
manufactured by Cisco Systems and sold under the name 
LocalDirector. Those having skill in the art will appreciate 
that other known hardware devices can perform the function 
of redirector 60. 

Turning now to FIG. 5, log files 46A, 48A, 50A are each
stored on the server corresponding to the numeral used to 
indicate the log file. These log files are generated and stored 
in the manner described in connection with the server of 
FIG. 1. In FIG. 5, the hits are numbered sequentially, hit 
number 1 through hit number 13 in the chronological order 
in which each traffic data hit was generated. In the depiction 
of FIG. 5, each of log files 46A, 48A, 50A is still being 
added to. That is, in log file 46A, for example, hit number 1
is the first-stored data hit, and hit number 5 is the next-stored
data hit, with hit numbers 6 and 12 being thereafter stored
in sequence. Because log file 46A is not yet full and it
remains open, additional hits may be stored in sequence after
hit number 12. The same is true for log files 48A, 50A.

Turning now to FIG. 6, included therein is a sorter 64, 
which examines the hits in sequence in each of the log files 
and passes them, in the chronological order in which each
hit was generated, to a log file analyzer 66. The log file
analyzer operates generally as described in connection with 
the server depicted in FIG. 1. Thereafter, analysis results are 
passed to analysis results 18A-C, also as described in 
connection with FIG. 1. 
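
The hand-off from a sorter to an analyzer can be sketched in Python as follows. This is a sketch under stated assumptions rather than the described routine: it substitutes the standard library's `heapq.merge`, a heap-based lazy merge, for the record-by-record comparison, and the toy `analyzer`, the helper `at` and the sample log contents are all hypothetical.

```python
import heapq
from collections import Counter
from datetime import datetime, timezone


def sorter(*log_files):
    """Yield hits from several time-ordered logs as one chronological stream."""
    # heapq.merge reads lazily, holding one pending hit per input.
    return heapq.merge(*log_files, key=lambda hit: hit[0])


def analyzer(hits):
    """Toy stand-in for a log file analyzer: count hits per hour of the day."""
    per_hour = Counter()
    for when, _request in hits:
        per_hour[when.hour] += 1
    return per_hour


def at(hour, minute):
    """Build a GMT timestamp on an arbitrary example date."""
    return datetime(1996, 1, 12, hour, minute, tzinfo=timezone.utc)


# Hypothetical contents of three concurrently generated log files.
log_a = [(at(20, 1), "/index.htm"), (at(20, 40), "/general.htm")]
log_b = [(at(20, 5), "/general.htm"), (at(21, 2), "/index.htm")]
log_c = [(at(20, 30), "/news.htm"), (at(21, 15), "/general.htm")]

ordered = list(sorter(log_a, log_b, log_c))
print([when.strftime("%H:%M") for when, _ in ordered])
print(analyzer(ordered))
```

Because the merged stream is produced lazily, the analyzer could consume `sorter(...)` directly without building the intermediate list. The list is used here only so the order can be printed.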

The operation of sorter 64 can best be understood with
reference to the following Table 2 and to the flow diagram
depicted in FIG. 8.

TABLE 2

[table body not recoverable from source]

First, in block 68 of FIG. 8, the first record received in
each log file 46A, 48A, 50A is selected. This selection is
depicted in Table 2, line 1, in which hits 1, 2, and 4 appear
in the Compare column. In block 70, sorter 64 compares
each of hits 1, 2, and 4 and passes the oldest (in time) record,
namely hit 1 (block 72). The routine next determines, in
block 74, whether all the records in all of the log files have
been selected, compared and passed. If so, the routine ends
in block 76. If not, in block 78 the routine selects the next
record in the log file containing the record that was passed
in block 72. In the example currently under consideration,
the next record is hit number 5 in log file 46A. Next, with
reference to line 2 of Table 2, in block 70, hits 5, 2 and 4
are compared, and hit 2 is passed, it being the oldest (in time)
of the three records compared.

Because the routine operates in first-in, first-out (FIFO)
sequence on each of the log files, it can process while the
files remain open and continue to receive additional hits in
sequence.

In the example of FIG. 7, log files 80, 82, 84 are operated
on by sorter 64. It should be noted that these log files include
hits that are in sequential chronological order. What is more,
each of the log files is generated in chronological order.
Thus, log file 80 represents an identified time period,
ranging between the time associated with hit 1 and the time of hit
4; log file 82, ranging between the times of hits 5 and 8; and
log file 84 between hits 9 and 12. With reference again to
FIG. 8, and to Table 3, which depicts the sequential
comparisons made on the log file records in FIG. 7, hits 1, 5 and
9 are selected in block 68 and compared in block 70. Hit 1,
the oldest record, is passed in block 72 and the next-in
record, hit number 2, is selected in block 78. This sequence
continues until all of hits 1 through 12 are passed, hits 1
through 4 being first passed in sequence from log file 80, hits
5 through 8 being next passed in sequence from log file 82,
and finally hits 9 through 12 in sequence from log file 84.

TABLE 3

Compare    Passes

[table body not recoverable from source]

The present invention therefore properly sorts traffic data
hits in either concurrently-generated or consecutively-generated
log files. This is advantageous because it obviates
the need for separate routines or for configuring a program
dependent upon whether the log files are consecutive or
concurrent. In addition, the present invention is capable of
sorting log files while they continue to receive and store new
traffic data hits. This analysis on the fly provides users with
statistical data and reports on a near real time basis.

Numerous modifications and embodiments of the
invention will be obvious to those skilled in the art. Although the
present invention has been described in terms of one
embodiment, this should not be interpreted as limiting.
Various alterations, modifications and combinations will no
doubt become apparent to those skilled in the art after having
read the above disclosure. Accordingly, the appended claims
should be interpreted as covering all alterations and
modifications that fall within the spirit and scope of the invention.

We claim:

1. A method for analyzing traffic data generated by a
plurality of web servers connected via a network to a
plurality of computing devices comprising:

(a) generating a plurality of traffic data hits, each of said
hits corresponding to a data packet exchanged between
one of the servers and one of the computing devices;

(b) associating the data hits with their respective servers;

(c) reading a first data hit from each server;

(d) comparing the read data hits;

(e) passing the oldest data hit;

(f) [illegible];

(g) repeating (d) through (e) until all of the data hits are
read; and

(h) analyzing the passed data hits.

2. The method of claim 1 wherein (b) and (c) are
performed substantially simultaneously.

3. The method of claim 2 wherein associating the data hits
with their respective servers comprises storing each data hit
in a log file on its associated server, said traffic data hits
being generated in chronological sequence, and wherein
different ones of said log files contain data hits
corresponding to traffic data hits generated in the same time period.

4. The method of claim 2 wherein said web servers mirror
one another.

5. The method of claim 1 wherein (b) is performed
prior to (c).

6. The method of claim [illegible] wherein associating the data hits
with their respective servers comprises storing each data hit
in a log file on its associated server, said traffic data hits
being generated in chronological sequence, and wherein
different ones of said log files contain data hits
corresponding to traffic data hits generated in the same time period.

7. The method of claim 5 wherein said web servers mirror
one another.

8. The method of claim 1 wherein said web servers mirror
one another.

9. The method of claim 1 wherein associating the data hits
with their respective servers comprises storing each data hit
in a log file on its associated server.

10. The method of claim 9 wherein said traffic data hits
are generated in chronological sequence and wherein
different ones of said log files contain data hits corresponding
to traffic data hits generated in the same time period.

11. A method for analyzing log files containing a plurality
of data hits in sequence, each of which corresponds to a
traffic data hit generated by a web server, said method
comprising:

(a) selecting the first data hit in each log file;

(b) comparing the selected data hits;

(c) passing the oldest of the selected data hits to a log file 
analyzer; 

(d) selecting the next data hit in the log file in which the 
passed data hit was selected; and 

(e) repeating steps (b) through (d) until all of the data hits
in the log files are passed. 

12. The method of claim 11 wherein said data hits are each
associated with a unique time and wherein the last record in 
one file is associated with a time later than the first record in 
another file. 

13. The method of claim 12 wherein one of said log files
is generated by a first web server and wherein another of said
log files is generated by a second web server.

14. The method of claim 11 wherein said log files are each 
associated with a unique time period and wherein the time 
for each data hit is within the period for its log file. 

15. The method of claim 11 wherein one of said log files 
is generated by a first web server and wherein another of said 
log files is generated by a second web server.

16. A method for analyzing web-server log files compris- 
ing: 

generating traffic data hits representing actions on the web
server;

generating a data hit associated with each traffic data hit, 
each data hit being associated with a unique time; 

storing the data hits in a plurality of log files; 

sorting data hits from a plurality of the log files into 
chronological order; and 

analyzing the sorted data hits. 

17. The method of claim 16 wherein storing the data hits
in sequence in a plurality of log files comprises alternating
storing the data hits into different log files.

18. The method of claim 17 wherein the data hits within
each log file are stored in sequence. 
19. The method of claim 18 wherein sorting data hits from
a plurality of the log files into chronological order com- 
prises: 

(a) selecting the first data hit in each log file; 

(b) comparing the selected data hits; 

(c) passing the oldest of the selected data hits to a log file 
analyzer; 

(d) selecting the next data hit in the log file in which the 
passed data hit was selected; and 

(e) repeating (b) through (d) until all of the data hits in the
log files are passed. 

20. The method of claim 19 wherein storing the data hits 
in a plurality of log files and sorting data hits from a plurality 
of log files into chronological order are performed substantially
simultaneously.

21. The method of claim 19 wherein storing the data hits 
in a plurality of log files is performed prior to sorting data 
hits from a plurality of log files into chronological order. 

22. A system for analyzing web-server log files compris- 
ing: 

a source of traffic data hits generated by a web server; 
each of said data hits being associated with a unique
time; 

a log file containing the data hits in sequence; 

a sorter for sorting the data hits from a plurality of the log
files into chronological order; and

an analyzer for analyzing the sorted data hits.

23. The system of claim 22 wherein said sorter comprises:

means for selecting a data hit in each log file;

means for comparing the selected data hits; and

means for passing the oldest of the selected data hits to the
analyzer.

* * * * *